Infinitely Imbalanced Logistic Regression
Abstract
In binary classification problems it is common for the two classes to be imbalanced: one class is very rare compared to the other. In this paper we consider the infinitely imbalanced case, where one class has a finite sample size and the other class's sample size grows without bound. For logistic regression, the infinitely imbalanced case often has a useful solution. Under mild conditions, the intercept diverges as expected, but the rest of the coefficient vector approaches a nontrivial and useful limit. That limit can be expressed in terms of exponential tilting and is the minimum of a convex objective function. The limiting form of logistic regression suggests a computational shortcut for fraud detection problems.
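To make the exponential tilting statement concrete, here is a minimal Python sketch. It is not code from the paper. It assumes the limiting slope vector beta is the one for which the common-class distribution, reweighted by exp(x'beta), has mean equal to the rare-class sample mean xbar, and it replaces the common-class distribution with a simulated sample. The function name `tilted_slope`, the standard normal sample, and the value of `xbar` are all illustrative assumptions. The equation is solved with a plain Newton iteration on the convex log-sum-exp objective, with no line search.

```python
import numpy as np

def tilted_slope(x0, xbar, iters=50, tol=1e-10):
    """Find beta such that the exp(x'beta)-weighted mean of the rows of x0 equals xbar.

    x0   : (n, d) sample standing in for the common-class distribution (assumption)
    xbar : (d,) mean of the rare-class predictors
    """
    z = x0 - xbar                          # centre the common-class points at xbar
    beta = np.zeros(z.shape[1])
    for _ in range(iters):
        s = z @ beta
        w = np.exp(s - s.max())            # tilting weights, shifted for stability
        w /= w.sum()
        grad = w @ z                       # tilted mean minus xbar
        if np.linalg.norm(grad) < tol:
            break
        zc = z - grad
        hess = (zc * w[:, None]).T @ zc    # tilted covariance matrix
        beta -= np.linalg.solve(hess, grad)  # undamped Newton step
    return beta

# Illustrative data: standard normal common class, hypothetical rare-class mean.
rng = np.random.default_rng(0)
x0 = rng.standard_normal((200_000, 2))
xbar = np.array([1.0, 0.5])

beta = tilted_slope(x0, xbar)
print("estimated limiting slope:", beta)
```

For a Gaussian common class N(mu, Sigma), tilting by exp(x'beta) shifts the mean to mu + Sigma beta, so in this standard normal example the estimate should land close to xbar itself. Note that only the mean of the rare-class points enters the calculation, and xbar needs to lie well inside the cloud of common-class points for the iteration to be well behaved.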
Related resources
Mine Classification with Imbalanced Data
In binary classification problems it is common for the two classes to be imbalanced: one case is very rare compared to the other. Traditional classification approaches usually ignore this class imbalance, causing performance to suffer accordingly. In contrast, the infinitely imbalanced logistic regression (IILR) algorithm explicitly addresses class imbalance in its formulation. This p...
Weighted logistic regression for large-scale imbalanced and rare events data
Latest developments in computing and technology, along with the availability of large amounts of raw data, have led to the development of many computational techniques and algorithms. Concerning binary data classification in particular, analysis of data containing rare events or disproportionate class distributions poses a great challenge to industry and to the machine learning community. Logis...
Default Prediction for Real Estate Companies with Imbalanced Dataset
When analyzing default predictions in real estate companies, the number of non-defaulted cases always greatly exceeds the number of defaulted ones, which creates the two-class imbalance problem. This lowers the ability of prediction models to distinguish the default sample. In order to avoid this sample selection bias and to improve the prediction model, this paper applies a minority sample generation app...
Predictive Data Mining for Highly Imbalanced Classification
The paper addresses some theoretical and practical aspects of data mining, focusing on predictive data mining, where two central types of prediction problems are discussed: classification and regression. Further emphasis is placed on predictive data mining in which time-stamped data greatly increase the dimensionality and complexity of problem solving. The main goal is through processing of data (rec...
Spam Sender Detection with Classification Modeling on Highly Imbalanced Mail Server Behavior Data
Unsolicited commercial or bulk emails or emails containing viruses pose a great threat to the utility of email communications. A recent solution for filtering is reputation systems that can assign a value of trust to each IP address sending email messages. By analyzing the query patterns of each node utilizing reputation information, reputation systems can calculate a reputation score for each ...
Journal title:
- Journal of Machine Learning Research
Volume 8
Publication date: 2007